bootstrap method
Deep Bootstrap
Chang, Jinyuan, Jiao, Yuling, Kang, Lican, Shi, Junjie
As a result, the demand for interval estimation, and consequently for its validity and precision, has grown steadily over time, as reflected in a number of recent studies. For example, in proteomics, confidence intervals are employed to assess the association between post-translational modifications and intrinsically disordered regions of proteins, validating hypotheses derived from predictive models and facilitating large-scale functional analyses (Tunyasuvunakool et al., 2021; Bludau et al., 2022). In genomic research, confidence intervals are leveraged to characterize the distribution of gene expression levels, enabling robust inferences about promoter sequence effects and genetic variability (Vaishnav et al., 2022). In environmental science, interval estimation can be used to monitor deforestation rates, yielding uncertainty-aware insights critical for climate policy formulation (Bullock et al., 2020). In the social sciences, confidence intervals are utilized to evaluate relationships between socioeconomic factors, bolstering the robustness of conclusions drawn from census data (Ding et al., 2021).
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
- Government > Regional Government > North America Government > United States Government > FDA (0.46)
Asymptotics of the Bootstrap via Stability with Applications to Inference with Model Selection
One of the most commonly used methods for forming confidence intervals is the empirical bootstrap, which is especially expedient when the limiting distribution of the estimator is unknown. However, despite its ubiquitous role in machine learning, its theoretical properties are still not well understood. Recent developments in probability have provided new tools to study the bootstrap method, but they have so far been applied only to specific applications and contexts, and it is unclear whether these techniques can establish the consistency of the bootstrap in machine learning pipelines. In this paper, we derive general stability conditions under which the empirical bootstrap estimator is consistent and quantify the speed of convergence. Moreover, we propose alternative ways to use the bootstrap method to build confidence intervals with coverage guarantees. Finally, we illustrate the generality and tightness of our results through examples of interest for machine learning, including two-sample kernel tests after kernel selection and the empirical risk of stacked estimators.
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
- Government > Regional Government > North America Government > United States Government > FDA (0.46)
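To make the construction concrete, here is a minimal sketch of the empirical (nonparametric) bootstrap percentile interval referred to in the abstract above. It is a generic illustration rather than the paper's stability-based procedure; the sample `x`, the plug-in `statistic`, and the resample count `n_boot` are placeholder names.

```python
import numpy as np

def bootstrap_percentile_ci(x, statistic, n_boot=2000, alpha=0.05, rng=None):
    """Empirical (nonparametric) bootstrap percentile confidence interval.

    x          : 1-D array of observations
    statistic  : callable mapping a sample to a scalar estimate
    n_boot     : number of bootstrap resamples
    alpha      : miscoverage level (0.05 gives a 95% interval)
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    # Recompute the statistic on samples drawn with replacement from x.
    boot_stats = np.array([
        statistic(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Example: 95% interval for the mean of a skewed sample.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)
print(bootstrap_percentile_ci(sample, np.mean, rng=1))
```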
A Wild Bootstrap for Degenerate Kernel Tests
A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.
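For intuition about the wild bootstrap for degenerate kernel V-statistics, the sketch below perturbs a two-sample MMD statistic with Rademacher sign multipliers to obtain bootstrap critical values. This is a simplified i.i.d. version: the paper's construction for random processes replaces the independent multipliers with a dependent (e.g. autoregressive) wild bootstrap process, and the Gaussian kernel and bandwidth here are arbitrary choices for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * bandwidth**2))

def wild_bootstrap_mmd_test(x, y, n_boot=1000, bandwidth=1.0, rng=None):
    """Two-sample MMD test with wild-bootstrap critical values.

    Assumes equal sample sizes and i.i.d. data, so Rademacher multipliers
    are used; for time series the multipliers would be a dependent process.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    assert len(y) == n, "this sketch pairs the samples, so sizes must match"
    # Degenerate V-statistic kernel:
    # H_ij = k(x_i,x_j) + k(y_i,y_j) - k(x_i,y_j) - k(y_i,x_j)
    kxy = gaussian_kernel(x, y, bandwidth)
    h = (gaussian_kernel(x, x, bandwidth) + gaussian_kernel(y, y, bandwidth)
         - kxy - kxy.T)
    stat = h.mean()  # biased (V-statistic) MMD^2 estimate
    # Wild bootstrap: perturb the V-statistic with random sign multipliers.
    w = rng.choice([-1.0, 1.0], size=(n_boot, n))
    boot = np.einsum('bi,ij,bj->b', w, h, w) / n**2
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(0.5, 1.0, size=(100, 2))
print(wild_bootstrap_mmd_test(x, y, rng=1))
```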
Generalized Bayesian Ensemble Survival Tree (GBEST) model
Ballante, Elena, Muliere, Pietro, Figini, Silvia
This paper proposes a new class of predictive models for survival analysis called Generalized Bayesian Ensemble Survival Tree (GBEST). Survival analysis poses many challenges, in particular with small datasets or complex censoring mechanisms. Our contribution is an ensemble approach that uses the Bayesian bootstrap and the beta-Stacy bootstrap to improve performance in survival applications, with a special focus on small datasets. More precisely, we propose a novel approach that integrates the beta-Stacy Bayesian bootstrap into bagged tree models for censored data. Empirical evidence on simulated and real data shows that our approach outperforms classical survival models from the literature in terms of predictive performance and stability of the results. Methodologically, our contribution adapts recent Bayesian ensemble approaches to survival data, yielding the Generalized Bayesian Ensemble Survival Tree (GBEST) model. A further computational contribution is an R implementation of GBEST, available in a public GitHub repository.
- North America > United States > New York (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Law > Civil Rights & Constitutional Law (0.75)
- Education > Curriculum > Subject-Specific Education (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
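As a rough illustration of the Bayesian-bootstrap ingredient in the GBEST entry above, the sketch below replaces integer resampling counts with Dirichlet(1, ..., 1) observation weights inside a plain bagging loop. It uses ordinary regression trees rather than the paper's beta-Stacy survival trees for censored data, so it conveys only the weighting idea, not the GBEST model or its R implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bayesian_bootstrap_bagging(X, y, n_trees=100, rng=None):
    """Bagging with Bayesian-bootstrap (Dirichlet) observation weights.

    Instead of resampling rows with replacement, each tree is fit on the
    full data with exchangeable Dirichlet(1, ..., 1) weights. This mirrors
    the Bayesian-bootstrap step of GBEST, but with plain regression trees
    rather than the paper's beta-Stacy survival trees.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    ensemble = []
    for _ in range(n_trees):
        w = rng.dirichlet(np.ones(n))  # Bayesian-bootstrap weights
        tree = DecisionTreeRegressor(max_depth=4, random_state=0)
        tree.fit(X, y, sample_weight=w)
        ensemble.append(tree)
    return ensemble

def predict(ensemble, X):
    # Average the per-tree predictions, as in standard bagging.
    return np.mean([t.predict(X) for t in ensemble], axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
trees = bayesian_bootstrap_bagging(X, y, rng=1)
print(predict(trees, X[:5]))
```

Replacing the Dirichlet draw with a multinomial count vector over the observations recovers the classical bootstrap bagging scheme.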